Abstract

In a vision-based automatic agricultural vehicle guidance system for row-crop applications, finding
guidance information from crop row structure is the key to achieving accurate control of the vehicle.
This paper describes a robust procedure to obtain a guidance directrix. The procedure includes row
segmentation by a K-means clustering algorithm, row detection by a moment algorithm, and guidance line
selection by a cost function. Auxiliary information, such as the known crop row spacing, is used to aid
in the development of the guidance directrix. Two image data sets, one taken from a soybean field and
the other taken from a corn field, were used to evaluate the accuracy of the proposed image processing
procedure. The average RMS offset error from 30 soybean images was 1.0 cm with an average cost of
4.99. In contrast, the average RMS offset error from 15 corn images was 2.4 cm with an average cost of
7.27. The proposed image processing procedure was implemented on a vision-based guidance tractor.
©  2004  Elsevier  B.V.  All  rights  reserved. 
1. Introduction

Potential benefits of automated agricultural vehicles include increased productivity, increased
application accuracy, and enhanced operation safety. The first stage of development
in agricultural vehicle automation, automatic vehicle guidance, has been studied for many
years. A number of innovations were explored as early as the 1920s (Willrodt, 1924; Sissons,
1939; Schafer and Young, 1979). The earlier guidance systems demonstrated their technical
feasibility, but commercialization was unsuccessful mainly due to the high cost of those
systems.

The rapid advancement in electronics, computers, and computing technologies has inspired
renewed interest in the development of vehicle guidance systems. Various guidance
technologies, including mechanical guidance, optical guidance, radio navigation, and
ultrasonic guidance, have been investigated (Reid et al., 2000; Tillett, 1991). In general, any
guidance technology will need to provide positioning information of the vehicle, referenced
either in a global coordinate system or in a local coordinate system.

Since the early 1990s, Global Positioning System (GPS) receivers have been widely used as
global guidance sensors (Bell, 2000; Larsen et al., 1994; Yukumoto et al., 2000). GPS-based
guidance technology can be used for many field operations such as tillage, planting,
cultivating, and harvesting. It has the potential to achieve completely autonomous navigation.
However, the high cost of precision GPS receivers is the major obstacle to their widespread
use in agricultural vehicle navigation.

Machine  vision  technology  can  be  used  to  automatically  guide  a  vehicle  when  crop  row 
structure  is  distinguishable  in  a  field.  Typical  applications  include  guiding  a  tractor  for 
row-crop  cultivation,  or  guiding  a  combine  for  harvest  operation.  The  guidance  sensor,  i.e., 
the  camera,  is  a  local  sensor  because  only  the  relative  location  of  the  vehicle,  with  respect  to 
the  crop  rows,  can  be  determined.  Machine  vision  guidance  has  the  advantage  of  using  local 
features  to  fine-tune  the  vehicle  navigation  course.  It  has  the  technological  characteristics 
closely  resembling  those  possessed  by  a  human  operator,  and  thus  has  great  potential  for 
implementation  of  a  vehicle  guidance  system  (Wilson,  2000). 

In a vision-based vehicle guidance system, finding guidance information from crop row
structure is the key to achieving accurate control of the vehicle. A number of image
processing techniques have been investigated to find the guidance course (directrix) from row-crop
images. As examples, Reid et al. (1985) developed a binary thresholding strategy using a
Bayes classification technique to effectively and accurately segment crop canopy and soil
background for cotton crop at different growth stages. Gerrish et al. (1985) concluded in
their study that thresholded intensity images alone will not work in all cases, and they
showed that the combination of noise filtering, edge detection, thresholding, and re-scaling
was the most promising technique. Image analysis using the Hough transform to find crop rows
was reported in several studies (Marchant and Brivot, 1995; Marchant, 1996).

Only  a  few  vision  guidance  systems  have  been  successfully  developed  and  tested  in 
field  trials.  Billingsley  and  Schoenfisch  (1997)  reported  a  vision  guidance  system  that  is 
relatively  insensitive  to  additional  visual  ‘noise’  from  weeds,  while  tolerating  the  fading 
out  of  one  or  more  rows  in  a  barren  patch  of  the  field.  They  showed  that  their  system  is 
capable  of  maintaining  an  accuracy  of  2  cm.  Since  1997,  the  University  of  Illinois  research 
team  has  been  working  with  industry  partners  to  develop  an  automatic  vision  guidance 
tractor.  The  work  has  resulted  in  several  US  patents  that  disclose  the  guidance  system. 
This  paper  describes  the  procedure  used  in  the  guidance  system  for  obtaining  the  guidance 
directrix  from  row-crop  images.  The  main  objective  of  the  reported  research  was  to  develop 
an  image  processing  procedure  that  can  be  implemented  in  a  vision-based  guidance  tractor 
in  real-time  with  adequate  accuracy.  Examples  are  given  to  illustrate  the  robustness  of  the 
proposed  image  processing  procedure. 
2. Overview of the vision-based guidance system

The vision-based guidance system developed at the University of Illinois includes a vision
sensor, an image processor, a DGPS receiver, a navigation planner, a steering controller,
and a steering actuator (Fig. 1). The vision sensor is a Cohu 2100 series monochrome
CCD camera (Cohu, Inc., San Diego, California). The camera has a fixed gain and is fitted with an NIR
bandpass filter at 800 nm. It acquires a scene image in front of the vehicle. The
NIR filter can enhance the intensity (brightness) of crop pixels in the scene image. The
image processor processes the acquired image to obtain a guidance directrix. The guidance
directrix, along with DGPS positional information and other tractor states, is used by the
navigation planner to determine a navigation command, i.e., the commanded wheel angle.
Based on the difference between the commanded and the measured wheel angles, the
steering controller computes a steering control signal and sends it to drive
the steering actuator. The steering actuator is an electrohydraulic (E/H) steering system that
controls the turning rate of the vehicle wheels. After the vehicle moves a short distance, the
next loop starts with the vision sensor acquiring a new image.

A closed-loop control strategy is used in the steering controller, with the measured wheel
angle as the feedback information. The control loop is executed at a faster rate than the image
processing loop to achieve better vehicle control. Several steering controllers, including a
PID controller, a feed-forward PID (FPID) controller, and a fuzzy logic (FL) controller, have
been developed and implemented in the guidance system (Qiu et al., 1999; Wu et al., 1999;
Zhang, 1999).

As a linkage between the image processor and the steering controller, the navigation
planner plays a key role in the guidance system. Development of the navigation planner has
been discussed in Han et al. (2001). The image processor in Fig. 1 is critical for the success
of the vision-based guidance system. Development of the image processor to obtain the
guidance directrix is discussed in the next section.

Fig. 1. Components of the vision-based guidance system developed at the University of Illinois.


3.  Development  of  the  guidance  directrix 

To achieve high image processing speed, only a part of the entire scene image, called the
region-of-interest (ROI), is processed (Fig. 2). The ROI contains an upper region and a lower
region, each of which includes two crop rows to be tracked. A conventional K-means clustering
algorithm is used to compute a threshold in the ROI. The threshold value depends on the
pixel intensity distribution in the ROI and is used to identify pixels that represent crop or
vegetation. Image segmentation is performed in four tracking windows within the ROI.
A linear regression algorithm is applied in each tracking window to obtain a navigation
line that represents the center of the crop row. Finally, all navigation lines in the image are
combined to obtain a reliable guidance directrix. The procedure is described in detail in the
following paragraphs.

3.1.  Row  segmentation  by  K-means  clustering 

The  image  taken  by  the  monochrome  CCD  camera  is  digitized  using  a  frame  grabber 
PXC200  (CyberOptics  Corporation,  Portland,  Oregon)  into  a  two-dimensional  array  of 
pixels,  with  each  pixel  represented  by  a  gray  level  (GL)  between  0  and  255.  In  a  typical 
image  scene  (Fig.  2),  the  brighter  pixels  with  higher  GL  represent  crop,  and  the  darker 
pixels  with  lower  GL  represent  non-crop  (e.g.,  soil).  Image  thresholding  is  the  first  step  to 
convert  the  gray  scale  image  into  a  binary  image  so  that  crop  pixels  are  separated  from  the 
background. 

If we construct a histogram of the pixel GL values, the pixels representing crop or vegetation
would appear at the histogram's right side. The objective of image thresholding is to
find a GL value (threshold) in the histogram such that any pixel with a higher GL value than
that threshold would represent a crop pixel. A conventional K-means clustering algorithm
is used to compute a threshold in the ROI.


Fig. 2. Region-of-interest (ROI) for image thresholding, windows for image segmentation, and navigation lines
for guidance.
K-means clustering is a partitioning method for grouping objects so that the within-group
variance  is  minimized.  By  minimizing  dissimilarity  of  each  subset  locally,  the  algorithm 
will  globally  yield  an  optimal  dissimilarity  of  all  subsets.  The  algorithm,  as  applied  to  image 
thresholding,  is  given  by  the  following  steps: 


1. Initialize the $K$ class centers. For simplicity, an equal-distance method is used to define
the initial class centers:

$$\mathrm{Center}_i^{0} = GL_{\min} + \left(i - \tfrac{1}{2}\right)\frac{GL_{\max} - GL_{\min}}{K}, \quad i = 1, 2, \ldots, K \tag{1}$$

where $\mathrm{Center}_i^{0}$ is the initial class center for the $i$th class, and $GL_{\max}$ and $GL_{\min}$ are the
maximum and minimum GL values in the sample space, respectively.

2. Assign each point to its closest class center. The criterion to assign a point to a class is
based on the Euclidean distance in the feature (GL) space using:

$$\mathrm{Distance}_{i,j} = \left| GL_j - \mathrm{Center}_i \right|, \quad i = 1, 2, \ldots, K; \; j = 1, 2, \ldots, N \tag{2}$$

where $\mathrm{Distance}_{i,j}$ is the distance from the $j$th point to the $i$th class, and $N$ is the total number
of points in the sample space.

3. Calculate the $K$ new class centers, each as the mean of the points assigned to that class.
The new class centers are calculated by

$$\mathrm{Center}_i = \frac{1}{N_i} \sum_{j \in \text{class } i} GL_j, \quad i = 1, 2, \ldots, K \tag{3}$$

where $N_i$ is the total number of points that are assigned to the $i$th class in step 2.

4. Repeat from step 2 if any class center has moved, or if the movement has not yet fallen below
a preset threshold. The criterion for measuring the change of the class centers is

$$\mathrm{Measure} = \sqrt{\sum_{i=1}^{K} \left( \mathrm{Center}_i^{n} - \mathrm{Center}_i^{n-1} \right)^2} \tag{4}$$

where $\mathrm{Center}_i^{n}$ and $\mathrm{Center}_i^{n-1}$ are the new (or the current) $i$th class center and the old
(or the previous) $i$th class center, respectively.

5. The threshold value is defined as the average of the $K$th class center and the $(K-1)$th
class center:

$$\mathrm{Threshold} = \tfrac{1}{2}\left( \mathrm{Center}_K + \mathrm{Center}_{K-1} \right) \tag{5}$$


As  an  example,  the  gray-scale  image  in  Fig.  2  was  sampled  every  pixel  to  obtain  a 
histogram  of  the  GL  values  (Fig.  3a).  The  minimum  GL  value  was  13,  and  the  maximum 
GL  value  was  238.  Three  classes  were  selected  in  this  example  ( K  =  3).  The  initial  class 
centers  were  (50,  125,  200).  After  eight  iterations  as  described  above,  the  new  class  centers 
were  found  as  (63,  120,  171)  (Fig.  3b).  The  threshold  was  selected  as  146  (Eq.  (5)).  Using 
this  threshold  value,  a  segmented  binary  image  was  obtained  (Fig.  4). 
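
The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used on the tractor: the function names, the stopping tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced here, and an empty class is simply left at its previous center.

```python
from typing import List, Sequence


def initial_centers(gl_min: float, gl_max: float, k: int) -> List[float]:
    """Equal-distance initial class centers, following Eq. (1)."""
    step = (gl_max - gl_min) / k
    return [gl_min + (i - 0.5) * step for i in range(1, k + 1)]


def kmeans_threshold(pixels: Sequence[int], k: int = 3,
                     tol: float = 0.5, max_iter: int = 100) -> float:
    """Return the gray-level threshold of Eq. (5) for a list of GL values.

    `tol` and `max_iter` are assumed stopping parameters; the paper does
    not state the values it used.
    """
    centers = initial_centers(min(pixels), max(pixels), k)
    for _ in range(max_iter):
        # Step 2: assign each gray level to its nearest class center.
        groups: List[List[int]] = [[] for _ in range(k)]
        for gl in pixels:
            nearest = min(range(k), key=lambda i: abs(gl - centers[i]))
            groups[nearest].append(gl)
        # Step 3: each new center is its class mean (empty classes stay put).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        # Step 4: movement of the centers, following Eq. (4).
        measure = sum((n - c) ** 2
                      for n, c in zip(new_centers, centers)) ** 0.5
        centers = new_centers
        if measure < tol:
            break
    # Step 5: midpoint of the two brightest class centers.
    return 0.5 * (centers[-1] + centers[-2])
```

For the example in the text, `initial_centers(13, 238, 3)` gives 50.5, 125.5, and 200.5, which match the reported initial centers (50, 125, 200) after truncation. Applying Eq. (5) to the reported final centers gives 0.5 × (120 + 171) = 145.5, which rounds to the reported threshold of 146.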


Fig. 3. K-means clustering method to find a threshold: (a) initial K (K = 3) classes; (b) class centers after
segmentation (threshold = 146). Both panels plot frequency (%) against pixel value.

Fig. 4. Row detection from the segmented binary image.
3.2. Row detection by a moment algorithm


Four tracking windows within the ROI are selected, and each window keeps track of
a segment of one crop row (Fig. 4). Within each tracking window, the crop row can be
approximated by a straight line with a centroid point, $(x_0, y_0)$, and an orientation angle, $\theta$.
The location of the centroid point within a window is determined by

$$x_0 = \frac{\sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} m \times GL(m, n)}{\sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} GL(m, n)}, \qquad
y_0 = \frac{\sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} n \times GL(m, n)}{\sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} GL(m, n)} \tag{6}$$

where $GL(m, n)$ is the GL value of the vegetation pixel at location $(m, n)$ in the binary
image, $\Delta y$ is the height (in pixels) of the tracking window, and $x_1(n)$ and $x_2(n)$ are the left and
right interception points of the tracking window with a horizontal line $y = n$, respectively.
Note that the operation is performed inside the tracking window with background pixels
excluded. Since the tracking window is an irregular quadrilateral rather than a true rectangle,
both $x_1(n)$ and $x_2(n)$ depend on the index $n$.

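The orientation angle, $\theta$, is determined by a moment algorithm:

$$\theta = \tan^{-1}\left\{ \frac{-(M_{20} - M_{02}) \pm \sqrt{(M_{20} - M_{02})^2 + 4M_{11}^2}}{2M_{11}} \right\} \tag{7}$$

where

$$\begin{aligned}
M_{20} &= \sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} (m - x_0)^2 \times GL(m, n) \\
M_{02} &= \sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} (n - y_0)^2 \times GL(m, n) \\
M_{11} &= \sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} (m - x_0)(n - y_0) \times GL(m, n)
\end{aligned} \tag{8}$$
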

Eqs.  (6)-(8)  are  applied  to  each  of  the  four  tracking  windows  to  obtain  four  navigation  lines. 
The  combination  of  these  navigation  lines  determines  a  guidance  directrix. 
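
A minimal sketch of the centroid and orientation computation is given below. It assumes a rectangular window stored as a list of rows (`window[n][m]`), with pixels outside the true tracking window set to 0, and it computes the angle with the equivalent two-argument arctangent form tan(2θ) = 2M11 / (M20 − M02) rather than by picking a root of Eq. (7) explicitly. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def row_line_from_moments(window: List[List[int]]) -> Tuple[float, float, float]:
    """Return (x0, y0, theta) for a binary window indexed as window[n][m]."""
    total = sx = sy = 0.0
    for n, row in enumerate(window):
        for m, gl in enumerate(row):
            total += gl
            sx += m * gl
            sy += n * gl
    if total == 0:
        raise ValueError("no vegetation pixels in the window")
    # Centroid of the vegetation pixels (Eq. (6)).
    x0, y0 = sx / total, sy / total

    # Central second moments (Eq. (8)).
    m20 = m02 = m11 = 0.0
    for n, row in enumerate(window):
        for m, gl in enumerate(row):
            m20 += (m - x0) ** 2 * gl
            m02 += (n - y0) ** 2 * gl
            m11 += (m - x0) * (n - y0) * gl

    # Angle of the major principal axis, measured from the image x-axis.
    theta = 0.5 * math.atan2(2.0 * m11, m20 - m02)
    return x0, y0, theta
```

The `atan2` form avoids the division by M11 in Eq. (7), which is zero for a perfectly vertical or horizontal row. For a one-pixel-wide diagonal row the function returns an angle of 45°, and for a vertical row it returns 90°.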

3.3.  Navigation  line  selection  by  a  cost  function 


The quality of a navigation line is measured by a cost function defined below:

$$\mathrm{cost} = \frac{M_{20}}{\sum_{n=0}^{\Delta y} \sum_{m=x_1(n)}^{x_2(n)} GL(m, n)} \tag{9}$$

where $M_{20}$ is defined in Eq. (8).

The  cost  function  is  essentially  a  second  moment  about  the  y-axis.  Greater  scattering  of 
vegetation  pixels  results  in  a  higher  cost  value.  Conversely,  less  scatter  (a  better  quality 
navigation  line)  results  in  a  lower  cost  value.  Cost  functions  are  applied  to  the  upper  and 
lower  regions  in  the  ROI  (Fig.  2)  to  determine  whether  the  navigation  lines  from  Eqs.  (6)-(8) 
are  directly  accepted,  or  need  to  be  further  processed  by  the  following  criteria: 

1. If the costs of the navigation lines in both the left and the right windows are acceptable,
both navigation lines are accepted.

2. If the cost of the navigation line in the left window is acceptable and the cost of the
navigation line in the right window is not acceptable, the left navigation line is accepted
and the right navigation line is calculated.

3. If the cost of the navigation line in the right window is acceptable and the cost of the
navigation line in the left window is not acceptable, the right navigation line is accepted
and the left navigation line is calculated.

4. If the costs of the navigation lines in both the left and the right windows are not acceptable,
the navigation lines determined from the previous image(s) are used.

Fig. 5. Calculated navigation line from the known row spacing.

The  cost  threshold  to  accept  or  reject  a  navigation  line  was  determined  a  priori  by  visual 
examination.  Typically,  a  cost  of  3  or  less  indicates  a  good  navigation  line,  but  a  navigation 
line  with  a  cost  between  3  and  8  is  also  acceptable.  When  the  cost  value  is  above  9,  the 
navigation  line  is  considered  as  unacceptable. 
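
The cost of Eq. (9) and the four acceptance cases can be sketched as follows. This is a hedged illustration only: the window is assumed to be a rectangular binary array with background pixels set to 0, the single cut-off `max_cost = 8.0` is an assumption (the text leaves costs between 8 and 9 unspecified), and treating an empty window as infinitely costly is also an assumption.

```python
from typing import List


def window_cost(window: List[List[int]]) -> float:
    """Eq. (9): horizontal second moment M20 divided by the pixel sum."""
    total = sum(sum(row) for row in window)
    if total == 0:
        return float("inf")  # assumption: an empty window is unusable
    x0 = sum(m * gl for row in window for m, gl in enumerate(row)) / total
    m20 = sum((m - x0) ** 2 * gl for row in window for m, gl in enumerate(row))
    return m20 / total


def select_lines(cost_left: float, cost_right: float,
                 max_cost: float = 8.0) -> str:
    """Map the four acceptance cases to an action label."""
    left_ok = cost_left <= max_cost
    right_ok = cost_right <= max_cost
    if left_ok and right_ok:
        return "accept both"
    if left_ok:
        return "accept left, calculate right"
    if right_ok:
        return "accept right, calculate left"
    return "reuse previous image"
```

A one-pixel-wide vertical row has a cost of 0, and the cost grows as vegetation pixels spread horizontally; for example, a window filled three pixels wide gives a cost of 2/3.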

The  top  points  on  navigation  lines  in  the  left  and  the  right  windows  should  be  separated 
by  a  known  crop  row  spacing.  The  same  should  be  true  for  the  two  bottom  points  on  those 
navigation  lines.  Mathematically,  these  conditions  are  expressed  as 

$$\begin{aligned}
x_{\mathrm{top,right}} &= x_{\mathrm{top,left}} + \Delta_{\mathrm{top}} \\
y_{\mathrm{top,right}} &= y_{\mathrm{top,left}} \\
x_{\mathrm{bot,right}} &= x_{\mathrm{bot,left}} + \Delta_{\mathrm{bottom}} \\
y_{\mathrm{bot,right}} &= y_{\mathrm{bot,left}}
\end{aligned} \tag{10}$$

where $\Delta_{\mathrm{top}}$ and $\Delta_{\mathrm{bottom}}$ are the distances between two crop rows in the image space (Fig. 5).
Because of the perspective view of the scene, $\Delta_{\mathrm{top}}$ and $\Delta_{\mathrm{bottom}}$ are not the same in the image
space. Eq. (10) is used to obtain the two end points on the calculated navigation line under
condition (2), (3), or (4) above. The centroid point, $(x_0, y_0)$, and the orientation angle, $\theta$, of
the calculated navigation line can be further obtained from Eq. (10).
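
As an illustration of Eq. (10), the sketch below shifts an accepted left navigation line by the image-space row spacings to obtain the right line. The function name is hypothetical, and taking the centroid as the segment midpoint and the angle from the two end points are assumptions about how the centroid and orientation are "further obtained".

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def calculated_right_line(top_left: Point, bot_left: Point,
                          delta_top: float, delta_bottom: float
                          ) -> Tuple[Point, Point, Point, float]:
    """Return (top_right, bot_right, centroid, theta) following Eq. (10)."""
    # End points: same image rows as the left line, shifted by the row spacing.
    top_right = (top_left[0] + delta_top, top_left[1])
    bot_right = (bot_left[0] + delta_bottom, bot_left[1])
    # Assumed reduction to a centroid (midpoint) and an orientation angle.
    centroid = (0.5 * (top_right[0] + bot_right[0]),
                0.5 * (top_right[1] + bot_right[1]))
    theta = math.atan2(bot_right[1] - top_right[1],
                       bot_right[0] - top_right[0])
    return top_right, bot_right, centroid, theta
```

The left line is obtained from an accepted right line in the same way, with the two spacings subtracted instead of added.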

Many factors, such as poor crop development, bare spots, or shadows, often make it
difficult to recognize crop rows correctly in one or more tracking windows. Using a cost
function is an effective way to identify poor-quality navigation lines in tracking windows,
and to replace these lines with calculated lines using auxiliary information, in this case a known crop
row spacing. It is evident that using multiple tracking windows can improve
the robustness of the image processing.

Fig. 6. Relative positions of navigation lines in the tracking windows. End points on navigation lines:
A (X1,top, Y1,top), B (X1,bot, Y1,bot), C (X2,top, Y2,top), D (X2,bot, Y2,bot), E (X3,top, Y3,top),
F (X3,bot, Y3,bot), G (X4,top, Y4,top), H (X4,bot, Y4,bot). Trajectory matrix points: P1 (x1, y1) to P5 (x5, y5).


3.4.  Improving  row  tracking  reliability 


In  addition  to  the  cost  function,  another  quality  measure  for  navigation  lines  in  tracking 
windows  is  the  relative  positions  among  these  lines.  As  shown  in  Fig.  6,  each  navigation 
line can be represented by two end points, $(X_{i,\mathrm{top}}, Y_{i,\mathrm{top}})$ and $(X_{i,\mathrm{bot}}, Y_{i,\mathrm{bot}})$, $i = 1, 2, 3, 4$.
In  an  ideal  situation,  the  distances  between  A  and  C,  B  and  D,  E  and  G,  F  and  H,  when 
transformed  to  the  vehicle  (ground)  coordinate  system,  should  be  close  to  a  known  row 
spacing.  Due  to  image  processing  error,  relative  positions  among  the  navigation  lines  may 
not  be  perfect.  The  following  criteria  are  used  to  adjust  navigation  lines: 


1. All the tracking windows are good, i.e.,

$$\begin{aligned}
S_{\min} &< T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\max} \\
S_{\min} &< T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\max} \\
S_{\min} &< T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\max} \\
S_{\min} &< T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\max} \\
S_{\min} &< T(X_{2,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\max} \\
S_{\min} &< T(X_{2,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\max} \\
S_{\min} &< T(X_{4,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\max} \\
S_{\min} &< T(X_{4,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\max}
\end{aligned} \tag{11}$$

where $T\{\cdot\}$ is the coordinate transformation from image space to vehicle space, $S_{\min}$ is
the minimum acceptable row spacing, and $S_{\max}$ is the maximum acceptable row spacing, with
$S_{\min} = S_0 - \Delta S$ and $S_{\max} = S_0 + \Delta S$. $S_0$ is the ideal row spacing, and $\Delta S$ is the
acceptable error in row spacing. No adjustment is needed for the navigation lines in this
case.

2. Top-left tracking window is not good, i.e.,

$$\begin{gathered}
T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\min} \ \text{or} \ T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) > S_{\max} \\
T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\min} \ \text{or} \ T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) > S_{\max} \\
S_{\min} < T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\max} \\
S_{\min} < T(X_{2,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{2,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\max}
\end{gathered} \tag{12}$$

The navigation line in the top-left tracking window is reset by Eq. (10).

3. Top-right tracking window is not good, i.e.,

$$\begin{gathered}
T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\min} \ \text{or} \ T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) > S_{\max} \\
T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\min} \ \text{or} \ T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) > S_{\max} \\
S_{\min} < T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\max} \\
S_{\min} < T(X_{4,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{4,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\max}
\end{gathered} \tag{13}$$

The navigation line in the top-right tracking window is reset by Eq. (10).

4. Bottom-left tracking window is not good, i.e.,

$$\begin{gathered}
S_{\min} < T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\max} \\
T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\min} \ \text{or} \ T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) > S_{\max} \\
T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\min} \ \text{or} \ T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) > S_{\max} \\
S_{\min} < T(X_{4,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{4,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\max}
\end{gathered} \tag{14}$$

The navigation line in the bottom-left tracking window is reset by Eq. (10).

5. Bottom-right tracking window is not good, i.e.,

$$\begin{gathered}
S_{\min} < T(X_{2,\mathrm{top}}) - T(X_{1,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{2,\mathrm{bot}}) - T(X_{1,\mathrm{bot}}) < S_{\max} \\
T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\min} \ \text{or} \ T(X_{4,\mathrm{top}}) - T(X_{3,\mathrm{top}}) > S_{\max} \\
T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\min} \ \text{or} \ T(X_{4,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) > S_{\max} \\
S_{\min} < T(X_{2,\mathrm{top}}) - T(X_{3,\mathrm{top}}) < S_{\max} \\
S_{\min} < T(X_{2,\mathrm{bot}}) - T(X_{3,\mathrm{bot}}) < S_{\max}
\end{gathered} \tag{15}$$

The navigation line in the bottom-right tracking window is reset by Eq. (10).
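
The five cases above amount to a small decision procedure, sketched below under stated assumptions. The input `tx` is assumed to hold the already transformed x-coordinates T(X) in vehicle space, keyed by line index (1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right) and end point; the function name, the key layout, and the "undetermined" fallback are hypothetical and not part of the paper.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (line index 1..4, "top" or "bot")


def diagnose_windows(tx: Dict[Key, float], s0: float, ds: float) -> str:
    """Classify which tracking window, if any, violates the row spacing."""
    s_min, s_max = s0 - ds, s0 + ds

    def spacings(a: int, b: int) -> List[float]:
        return [tx[(a, end)] - tx[(b, end)] for end in ("top", "bot")]

    def ok(a: int, b: int) -> bool:
        # Both the top and the bottom spacing lie inside (S_min, S_max).
        return all(s_min < d < s_max for d in spacings(a, b))

    def bad(a: int, b: int) -> bool:
        # Both the top and the bottom spacing lie outside [S_min, S_max].
        return all(d < s_min or d > s_max for d in spacings(a, b))

    if ok(2, 1) and ok(4, 3) and ok(2, 3) and ok(4, 1):
        return "all good"       # case 1
    if bad(2, 1) and ok(4, 3) and ok(2, 3):
        return "top-left"       # case 2
    if bad(2, 1) and ok(4, 3) and ok(4, 1):
        return "top-right"      # case 3
    if ok(2, 1) and bad(4, 3) and ok(4, 1):
        return "bottom-left"    # case 4
    if ok(2, 1) and bad(4, 3) and ok(2, 3):
        return "bottom-right"   # case 5
    return "undetermined"       # assumption: not covered by the five cases
```

The cross-region comparisons (line 2 against line 3, line 4 against line 1) are what identify the faulty window: when the upper pair fails but line 2 still agrees with line 3, the fault is attributed to line 1, and so on.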
3.5. Obtaining a guidance directrix


The guidance directrix in the image coordinate system (ICS) can be represented by an
image trajectory matrix as (Han et al., 2001):

$$T^{I}_{t_k} = \begin{bmatrix} P^{I}_{1,t_k} \\ \vdots \\ P^{I}_{i,t_k} \\ \vdots \\ P^{I}_{m,t_k} \end{bmatrix}
= \begin{bmatrix} x^{I}_{1,t_k} & y^{I}_{1,t_k} \\ \vdots & \vdots \\ x^{I}_{i,t_k} & y^{I}_{i,t_k} \\ \vdots & \vdots \\ x^{I}_{m,t_k} & y^{I}_{m,t_k} \end{bmatrix} \tag{16}$$

where $T^{I}_{t_k}$ is the image trajectory matrix at time $t = t_k$; $P^{I}_{i,t_k} = [x^{I}_{i,t_k} \;\; y^{I}_{i,t_k}]$ is the position of
the $i$th point on the guidance directrix; and $m$ is the total number of points on the guidance
directrix.

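For simplicity, five ($m = 5$) points were selected and included in the image trajectory
matrix from the following equation (Fig. 6):

$$\begin{aligned}
[x_1, y_1] &= \left[\tfrac{1}{2}(X_{1,\mathrm{top}} + X_{2,\mathrm{top}}),\; Y_{1,\mathrm{top}}\right] \\
[x_5, y_5] &= \left[\tfrac{1}{2}(X_{1,\mathrm{bot}} + X_{2,\mathrm{bot}}),\; Y_{1,\mathrm{bot}}\right] \\
[x_3, y_3] &= \left[\tfrac{1}{2}(x_1 + x_5),\; \tfrac{1}{2}(y_1 + y_5)\right] \\
[x_2, y_2] &= \left[\tfrac{1}{2}(x_1 + x_3),\; \tfrac{1}{2}(y_1 + y_3)\right] \\
[x_4, y_4] &= \left[\tfrac{1}{2}(x_3 + x_5),\; \tfrac{1}{2}(y_3 + y_5)\right]
\end{aligned} \tag{17}$$
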


Only  two  navigation  lines  in  the  upper  region  were  used  for  guidance.  The  navigation  lines 
in  the  lower  region  were  primarily  used  for  improving  the  tracking  reliability. 
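
The construction of the five trajectory points can be sketched as repeated midpoint bisection between the top and bottom of the two upper-region navigation lines. The function and argument names below are hypothetical; the arguments are the image-space end points of lines 1 and 2.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def midpoint(a: Point, b: Point) -> Point:
    """Midpoint of two image points."""
    return (0.5 * (a[0] + b[0]), 0.5 * (a[1] + b[1]))


def trajectory_points(top1: Point, bot1: Point,
                      top2: Point, bot2: Point) -> List[Point]:
    """Return [P1, P2, P3, P4, P5] following Eq. (17)."""
    # P1 and P5 lie halfway between the two rows at the top and the bottom.
    p1 = (0.5 * (top1[0] + top2[0]), top1[1])
    p5 = (0.5 * (bot1[0] + bot2[0]), bot1[1])
    # P3 bisects P1-P5; P2 and P4 bisect the two halves.
    p3 = midpoint(p1, p5)
    p2 = midpoint(p1, p3)
    p4 = midpoint(p3, p5)
    return [p1, p2, p3, p4, p5]
```

Because every point is a midpoint of points on the same segment, the five points are evenly spaced along the straight line from P1 to P5.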


4.  Results  and  discussion 

In  order  to  evaluate  the  performance  of  the  image  processing  procedure  under  outdoor 
conditions,  the  vision  sensor  (camera)  was  mounted  at  the  front  of  the  tractor  and  was  tilted 
about  30°  from  normal  in  the  direction  of  travel.  A  6  mm  lens  was  used  for  the  camera.  A 
300-MHz  Pentium  II  computer  operating  under  Windows®  98  was  installed  in  the  tractor 
cab  for  image  acquisition  and  processing.  In  the  following  paragraphs,  the  quantitative 
analysis  of  the  image  processing  procedure  is  provided  using  two  sets  of  test  images. 

4.1.  Images  from  soybean  crop 

A  series  of  30  images  were  taken  in  a  soybean  field  when  the  crop  was  about  8  cm  (3  in.) 
tall  at  a  V3  development  stage.  The  images  were  acquired  under  bright  sunlight  conditions. 
The  soybean  crop  had  a  continuous  row  structure  at  this  stage.  Visually,  the  crop  rows  can 
be easily separated from the soil background. Fig. 7(a1) and (b1) show two example images
from this data set.

Fig. 7. Two example soybean images: (a1) and (b1) are the original images; (a2) and (b2), (a3) and (b3) show the
results after K-means segmentation when K = 3 and 5, respectively.


The image size from the frame grabber was 256 × 243 (W × H). In order to include
two (and only two) crop rows in the ROI, the tracking window sizes were set to 55 × 75
for the bottom two windows and 45 × 75 for the top two windows (Fig. 7). Trajectory
matrices for both the upper region and the lower region were calculated by the procedure
described in Section 3.5. The two matrices were compared with the reference trajectory
matrices obtained by manually (visually) selecting the crop rows from the image. An offset
error for any point in the trajectory matrix, $(x_i, y_i)$, is defined by

$$E_i = x_i - x_i^{R} \tag{18}$$

where $x_i^{R}$ is the corresponding x-coordinate in the reference trajectory matrix at location
$y = y_i$. Three points in each of the two trajectory matrices, i.e., the top, middle, and bottom
points (Eq. (17)), were selected to quantify the offset error of an entire image.
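
As a small worked illustration of the offset error of Eq. (18), the sketch below computes the per-point errors and their RMS value. It assumes that the per-image RMS is taken over the six selected points (three from each trajectory matrix); the function names and the sample coordinates are made up for illustration.

```python
import math
from typing import List, Sequence


def offset_errors(x: Sequence[float], x_ref: Sequence[float]) -> List[float]:
    """Per-point offset errors between detected and reference x-coordinates."""
    if len(x) != len(x_ref):
        raise ValueError("x and x_ref must have the same length")
    return [xi - ri for xi, ri in zip(x, x_ref)]


def rms(errors: Sequence[float]) -> float:
    """Root-mean-square of a list of errors (in pixels)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical detected and reference x-coordinates for six points.
detected = [101.0, 99.0, 102.0, 98.0, 100.0, 100.0]
reference = [100.0] * 6
errors = offset_errors(detected, reference)
image_rms = rms(errors)                    # about 1.29 pixels
image_max = max(abs(e) for e in errors)    # 2.0 pixels
```

The maximum absolute error is computed alongside the RMS because both are reported per image.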

A  cost  was  calculated  by  Eq.  (9)  for  each  tracking  window  of  an  image.  The  average 
of  four  costs  associated  with  four  tracking  windows  of  an  image  was  used  to  quantify  the 
cost  of  the  entire  image.  The  offset  errors  and  costs  for  each  soybean  image  are  shown  in 
Table 1. The average RMS offset error for this data set is 0.93 pixels, which translates to a ground
distance of 1.0 cm in the center of the image. The average cost for this data set is 4.99.

It was found that the number of classes, K, in the K-means clustering algorithm had a
significant impact on the performance of the proposed procedure. Fig. 8 shows the relationship
between the average RMS offset error (for the entire data set) and the number of classes,
K, in the K-means algorithm. The RMS offset error was reduced from 3.17 when K = 3 to
Table 1
Offset error and costs for soybean images (K = 5)

Image ID    Offset error (pixel)     Costs
            RMS       Maximum        Mean      Maximum
1           1.16      1.75           4.18      5.32
2           0.82      1.51           5.78      6.50
3           1.22      2.07           4.52      5.75
4           1.43      2.99           5.10      5.89
5           0.91      1.30           4.72      7.44
6           0.75      1.24           5.22      6.79
7           0.77      1.67           4.73      6.14
8           0.57      1.19           5.77      7.57
9           0.78      1.16           5.25      6.29
10          0.78      1.50           4.71      5.49
11          0.88      1.61           4.65      5.62
12          1.01      1.84           4.36      5.72
13          1.50      3.15           5.43      6.23
14          0.61      0.88           4.48      5.98
15          0.89      1.61           4.58      4.97
16          0.64      1.22           5.11      5.92
17          0.69      1.29           4.78      5.57
18          0.67      1.15           4.55      5.33
19          0.77      1.46           4.84      6.30
20          0.46      0.92           4.80      5.70
21          1.09      1.74           4.82      6.28
22          0.76      1.53           5.00      7.16
23          1.48      2.32           5.13      6.43
24          1.83      3.37           5.56      8.42
25          0.33      0.66           5.14      7.03
26          0.31      0.46           5.32      7.50
27          0.42      0.85           5.59      6.43
28          2.24      4.65           6.31      7.63
29          0.82      1.21           5.06      6.74
30          1.16      1.76           4.26      5.07
Average     0.93      1.67           4.99      6.31

0.81  when  K  —  l.  Further  increasing  K  caused  too  many  crop  pixels  to  be  segmented  out. 
Fig. 7(a2), (a3), (b2), and (b3) illustrate the effects of K on the image processing quality.

4.2.  Images  from  corn  crop 

A series of 15 images was taken in a corn field when the crop was about 20 cm (8 in.)
tall at a V4 development stage. Again, the images were acquired under bright sunlight
conditions. The crop was too small to form a continuous row structure at this stage. In
addition, crop development was not uniform, resulting in small plants and/or no plants at
many locations. Although crop rows were visually identifiable, the non-continuous row
structure posed a great challenge for reliable image processing. Fig. 9(a1) and (b1) show two
example images from this data set.


Fig. 8. Relationship between the average RMS offset error (pixel) and the number of classes for soybean images.

Fig. 9. Two example corn images: (a1) and (b1) are the original images; (a2) and (b2), (a3) and (b3) show the
results after K-means segmentation when K = 3 and 5, respectively.
Table 2
Offset error and costs for corn images (K = 5)

Image ID    Offset error (pixel)     Costs
            RMS       Maximum        Mean      Maximum
2           2.25      4.75           6.59      8.37
3           1.10      2.41           6.76      7.44
4           2.64      5.77           6.41      8.21
5           0.53      0.93           5.52      7.55
6           1.23      1.57           6.36      8.39
7           1.78      3.13           7.29      8.07
8           3.52      7.67           8.52      12.12
9           2.30      3.70           6.55      7.92
10          1.37      1.89           6.82      8.56
11          4.71      10.31          9.16      14.99
12          2.60      5.76           6.62      7.97
13          1.00      2.00           6.55      7.66
14          2.48      5.28           7.56      10.54
15          2.78      4.68           8.67      9.34
16          3.39      6.08           9.64      11.60
Average     2.25      4.40           7.27      9.25

The image size, tracking window sizes, and the evaluation process were the same as for
the soybean data set, as discussed in the previous section. The offset errors and costs for
each corn image are shown in Table 2. The average RMS offset error for this data set is
2.25 pixels, which translates to a ground distance of 2.4 cm in the center of the image. The
average cost for this data set is 7.27. These numbers are much larger than for the soybean
data set, indicating the difficulty in correctly identifying crop rows.
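
The averages quoted above can be reproduced from the per-image values in Table 2. The short Python sketch below is only an illustration: the pixel-to-ground scale is not stated in this section, so `CM_PER_PIXEL_AT_CENTER` is an assumption inferred from the reported pair (2.25 pixels, 2.4 cm), and the variable names are hypothetical.

```python
# Per-image values transcribed from Table 2 (corn images, K = 5).
rms_px = [2.25, 1.10, 2.64, 0.53, 1.23, 1.78, 3.52, 2.30,
          1.37, 4.71, 2.60, 1.00, 2.48, 2.78, 3.39]
mean_cost = [6.59, 6.76, 6.41, 5.52, 6.36, 7.29, 8.52, 6.55,
             6.82, 9.16, 6.62, 6.55, 7.56, 8.67, 9.64]


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Assumption: ground distance per pixel at the image center, inferred from
# the reported 2.25 pixels ~ 2.4 cm. It is not a calibrated camera value.
CM_PER_PIXEL_AT_CENTER = 2.4 / 2.25

avg_rms_px = average(rms_px)
avg_rms_cm = avg_rms_px * CM_PER_PIXEL_AT_CENTER
avg_cost = average(mean_cost)

print(f"average RMS offset error: {avg_rms_px:.2f} pixels, "
      f"about {avg_rms_cm:.1f} cm at the image center")
print(f"average cost: {avg_cost:.2f}")
```

Because of perspective, the same pixel error corresponds to a different ground distance away from the image center, so this conversion applies only near the center row.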

The relationship between the average RMS offset error (for the entire data set) and the
number of classes, K, in the K-means algorithm is shown in Fig. 10. The RMS offset error
was slightly reduced from 2.34 when K = 3 to 2.20 when K = 6. However, the effect
of K on the RMS offset error was negligible for this data set. Fig. 9(a2) and (a3), (b2) and (b3)
illustrate the effects of K on the image processing quality.
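
To make the role of K concrete, the following Python sketch runs a plain one-dimensional K-means (Lloyd's algorithm) on a handful of pixel intensities and keeps only the brightest class. It is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes that crop pixels are brighter than soil pixels and that the class with the highest center is taken as crop. The function names and the sample intensities are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalar intensities; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def assign(current):
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda c: abs(v - current[c]))
                for v in values]

    for _ in range(iterations):
        labels = assign(centers)
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty class keeps its previous center
                centers[c] = sum(members) / len(members)
    return centers, assign(centers)


def crop_mask(values, k):
    """True for pixels in the brightest class (assumed here to be crop)."""
    centers, labels = kmeans_1d(values, k)
    brightest = max(range(k), key=lambda c: centers[c])
    return [lab == brightest for lab in labels]


# Hypothetical intensities: dark soil, mid-tone leaves, bright leaves.
pixels = [10, 20, 30, 40, 120, 130, 140, 180, 190, 200]
for k in (2, 3):
    print(k, sum(crop_mask(pixels, k)), "pixels kept as crop")
```

In this toy example, a larger K splits the vegetation intensities into more classes, so fewer pixels remain in the brightest class. This is one plausible reading of why a very large K can remove crop pixels from the segmented image.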


5.  Conclusions  and  future  work 

The image processing procedure based on the guidance directrix approach achieved good
accuracy on two test data sets: an average RMS offset error of 1.0 cm for a set of soybean
images and 2.4 cm for a set of corn images. In terms of accuracy, the procedure is acceptable
for real-time vision guidance applications.

The K-means algorithm is the major limitation on the processing speed of the procedure. Using
the above-described hardware, images were collected and processed at a rate of 6-10 Hz,
depending on the complexity of the image scene. This processing speed is acceptable for
real-time applications if the controller output rate is faster than, and independent of, the
image update rate. In our implementation, the controller update rate is 50 Hz, so the image
processing speed was deemed acceptable.
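
The relationship between the two rates can be illustrated with a small Python sketch. It assumes idealized, perfectly periodic image and controller updates, and it assumes that the controller reuses the most recent vision measurement until a new one arrives; neither detail is specified in the text. The function name is hypothetical.

```python
from collections import Counter


def controller_ticks_per_image(controller_hz, image_hz, n_ticks):
    """Count how many controller updates reuse each image measurement."""
    counts = Counter()
    for n in range(n_ticks):
        # Controller tick n occurs at time n / controller_hz seconds; the
        # most recent image available at that time has this index.
        latest_image = (n * image_hz) // controller_hz
        counts[latest_image] += 1
    return counts


# One second of a 50 Hz controller at the two ends of the 6-10 Hz range.
for image_hz in (10, 6):
    counts = controller_ticks_per_image(50, image_hz, 50)
    print(image_hz, "Hz images:", sorted(set(counts.values())),
          "controller updates per image")
```

Under these assumptions, each image measurement is reused for 5 controller updates at 10 Hz and for 8 or 9 updates at 6 Hz.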
Fig. 10. Relationship between the average RMS offset error and the number of classes for corn images.


Although the results from the above two sets of test images were favorable, the performance
of the proposed image processing procedure could degrade significantly under many
adverse environmental conditions, such as weeds or crop residue in the ROI. Quantitative
evaluation of the algorithm under these conditions was not performed. However, using four
tracking windows should increase the robustness of the image processing, since the navigation
lines in one or two poor-quality tracking windows could be calculated from the lines in
the other good-quality tracking windows.
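
One simple way to picture this fallback is sketched below in Python. It is a hypothetical illustration rather than the procedure used on the tractor. It assumes that each tracking window reports a lateral offset relative to a common reference together with its cost, that a window counts as poor when its cost exceeds a threshold, and that a poor window's offset is replaced by the mean of the good ones. The function name and the threshold value are assumptions.

```python
def fill_poor_windows(windows, cost_threshold):
    """Replace offsets of high-cost windows with the mean of the good ones.

    windows: list of (offset, cost) pairs, one per tracking window.
    Returns a list of offsets, or None if no window is good enough.
    """
    good = [offset for offset, cost in windows if cost <= cost_threshold]
    if not good:
        return None  # no reliable window to borrow from
    fallback = sum(good) / len(good)
    return [offset if cost <= cost_threshold else fallback
            for offset, cost in windows]


# Hypothetical offsets (pixels) and costs for four tracking windows;
# the third window has a high cost and is treated as poor quality.
windows = [(1.0, 5.0), (3.0, 6.0), (9.0, 15.0), (2.0, 5.5)]
print(fill_poor_windows(windows, cost_threshold=10.0))
```

A fuller version would presumably also use the known crop row spacing to predict where the line in a poor window should lie, rather than a plain average.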

The proposed image processing procedure has been implemented on a vision-based guidance
tractor. However, quantitative performance evaluation of the automatically guided
tractor is yet to be done. Fusing the vision sensor with other navigation sensors, such as
GPS, to provide a more robust guidance directrix should be an active research topic in the
future.